Mining Tandem Mass Spectral Data to Develop a More Accurate Mass Error Model for Peptide Identification
Abstract
The assumed mass error distribution of fragment ions plays a crucial role in peptide identification from tandem mass spectra. Previous mass error models are simplistic uniform or normal distributions with empirically set parameter values. In this paper, we propose a more accurate mass error model, the conditional normal model, together with an iterative parameter learning algorithm. The new model is based on two important observations about the mass error distribution: the mean of the mass error is linear in the ion mass, and the standard deviation of the mass error is log-log linear in the peak intensity. To our knowledge, the latter quantitative relationship has not been reported before. Experimental results demonstrate that our approach accurately quantifies the mass error distribution and that the new model improves the accuracy of peptide identification.
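The two relationships named in the abstract can be illustrated with a small sketch. It assumes the mass error follows a normal distribution with mean a * mass + b and standard deviation exp(c * ln(intensity) + d). The parameter names a, b, c, d, the use of natural logarithms, and the alternating fitting scheme below are all hypothetical choices made for illustration; the abstract does not specify the paper's iterative learning algorithm, so this should not be read as the authors' method.

```python
import math

# Mean of log(z^2) for z ~ N(0, 1), i.e. -(Euler gamma + ln 2).
MEAN_LOG_CHI2 = -1.2703628454614782


def weighted_line_fit(x, y, w=None):
    """Weighted least-squares fit of y ~ slope * x + intercept."""
    if w is None:
        w = [1.0] * len(x)
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def fit_conditional_normal(mass, intensity, error, n_iter=5):
    """Illustrative alternating fit (an assumption, not the paper's algorithm).

    Step 1: fit the mean line error ~ a * mass + b, weighting by 1 / sigma^2.
    Step 2: regress log squared residuals on log intensity to get c and d.
    """
    log_i = [math.log(v) for v in intensity]
    weights = None  # first pass is an unweighted fit
    for _ in range(n_iter):
        a, b = weighted_line_fit(mass, error, weights)
        log_r2 = [
            math.log(max((e - (a * m + b)) ** 2, 1e-300))
            for m, e in zip(mass, error)
        ]
        slope, intercept = weighted_line_fit(log_i, log_r2)
        # log r^2 = 2 * (c * log I + d) + log z^2, so correct for E[log z^2].
        c = slope / 2.0
        d = (intercept - MEAN_LOG_CHI2) / 2.0
        weights = [math.exp(-2.0 * (c * li + d)) for li in log_i]
    return a, b, c, d


def log_density(error, mass, intensity, a, b, c, d):
    """Log of the normal density of a mass error given ion mass and intensity."""
    mu = a * mass + b
    sigma = math.exp(c * math.log(intensity) + d)
    z = (error - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)
```

In a scoring function, `log_density` could replace a fixed-tolerance window when matching a theoretical fragment to an observed peak: a negative c would make high-intensity peaks tolerate less deviation than weak ones.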
Related articles
Development and Evaluation of Methods for Predicting Protein Levels and Peak Intensities from Tandem Mass Spectrometry Data
Tandem mass spectrometry (MS/MS) of peptides is a central technology for proteomics, enabling the identification of thousands of proteins and peptides from a complex mixture. With the increasing acquisition rate of tandem mass spectrometers, it has become possible to use data-mining techniques to attempt to solve important biological problems using MS/MS data. These problems include (i) estimat...
Data Mining in Protein Identification by Tandem Mass Spectrometry
Protein identification (sequencing) by tandem mass spectrometry is a fundamental technique for proteomics which studies structures and functions of proteins in large scale and acts as a complement to genomics. Analysis and interpretation of vast amounts of spectral data generated in proteomics experiments present unprecedented challenges and opportunities for data mining in areas such as data p...
Spectral profiles, a novel representation of tandem mass spectra and their applications for de novo peptide sequencing and identification.
Despite many efforts in the last decade, the progress in de novo peptide sequencing has been slow with only 30-45% of all peptides correctly reconstructed. We argue that accurate full-length peptide sequencing may be an unattainable goal for some spectra and demonstrate how to accurately sequence gapped peptides instead. We further argue that gapped peptides are nearly as useful as full-length ...
PepHMM: A Hidden Markov Model Based Scoring Function for Mass Spectrometry Database Search
An accurate scoring function for database search is crucial for peptide identification using tandem mass spectrometry. Although many mathematical models have been proposed to score peptides against tandem mass spectra, our method (called PepHMM, http://msms.cmb.usc.edu) is unique in that it combines information on machine accuracy, mass peak intensity, and correlation among ions into a hidden M...
A Hidden Markov Model Based Scoring Function for Mass Spectrometry Database Search
An accurate scoring function for database search is crucial for peptide identification using tandem mass spectrometry. Although many mathematical models have been proposed to score peptides against tandem mass spectra, our method (called PepHMM, http://msms.cmb.usc.edu) is unique in that it combines information on machine accuracy, mass peak intensity, and correlation among ions into a hidden M...
Journal: Pacific Symposium on Biocomputing
Publication date: 2007